#HIDDEN
from IPython.display import Image
from IPython.core.display import HTML
from ipywidgets import Text, Dropdown, Output, interact
from matplotlib import pyplot as plt
from pandas import DataFrame, merge, read_csv, Series, to_datetime
import cufflinks as cf
import qgrid
cf.go_offline()
# cf.set_config_file(offline=False, world_readable=True)
cf.set_config_file(offline=True, world_readable=True)
###  nbi:hide_in

MetaG Sequencing and Alignment Summaries

Full mapping script

Adapter Trimming and QC (Trimmomatic) Report

What is the abundance and quality of the reads in each sample?

Step 1. Split JGI files in PE files

split-paired-reads.py --gzip -1 \$sample.fastq.pe1.gz -2 \$sample.fastq.pe2.gz \$jgi_combined_fastq

Step 2. Use trimmomatic to QC reads
trimmomatic PE -phred33 \$sample.fastq.pe1.gz \$sample.fastq.pe2.gz \$sample.fastq.pe1.gz \$sample.fastq.se1.gz \$sample.fastq.pe2.gz \$sample.fastq.se2.gz \
   ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:8:TRUE \
   LEADING:3 \
   TRAILING:3 \
   SLIDINGWINDOW:4:15 \
   MINLEN:36

Go back to the Outline

Or skip to:
#HIDDEN
# <!-- # #HIDDEN
# # tImages = {"MetaG":Image("images/TrimReport_MetaG.png"),"MetaT":Image("images/MetaT_trimmomatic_plot.png")}
# # @interact
# # def trimReport(Source=["MetaG","MetaT"]):return tImages[Source] 
# <img src="images/TrimReport_MetaG.png"></img>
# -->
HTML("html/MetaG_TrimStats.html")

Host Plant Alignment Report

When the reads are aligned to plant reference genomes, how many of the reads are plant-aligned reads?

Step 1. Align reads to respective host reference assembly

if [[ \$sample == *"G5"* ]]; then
       SAMFILE=\$sample.SWGRASS.sam
       BAMFILE=\$sample.SWGRASS.bam
       bowtie2 --threads \$THREADS -x \$SWITCHGRASS
             -1 \$sample.fastq.pe1.gz -2 \$sample.fastq.pe2.gz
             -U \$sample.fastq.se12.gz -S \$sample.SWGRASS.sam 2>\$sample.stat
else
       SAMFILE=\$sample.MISCANTS.sam
       BAMFILE=\$sample.MISCANTS.bam
       bowtie2 --threads \$THREADS -x \$MISCANTHUS
             -1 \$sample.fastq.pe1.gz -2 \$sample.fastq.pe2.gz
             -U \$sample.fastq.se12.gz -S \$sample.MISCANTS.sam 2>\$sample.stat
fi

#HIDDEN
HTML(filename="html/BothPerc.html")

Sequencing and alignment Summaries

Fungal Alignment Report

What reads are fungal reads?


Step 1. Extract reads that don't align the plant assembly

# A. Convert sam > bam
samtools view -bS \$SAMFILE > \$BAMFILE

# B. Extract all unmapped reads (-f 4) that don't have a mate mapped (-F 256 i.e. both unmapped)
samtools view -b -f 4 -F 256 \$BAMFILE > \$sample.unmapped.bam

# C. Sort the reads
samtools sort -n \$sample.unmapped.bam -o \$sample.unmapped_sorted.bam

# D. Split the unmapped reads in R1 & R2 (back into paired end)
bedtools bamtofastq -i \$sample.unmapped_sorted.bam -fq \$sample.R1.fastq -fq2 \$sample.R2.fastq

Step 2. Align the reads to the combined fungal assemblies
bowtie2 -x \$FUNGAL -1 \$sample.R1.fastq -2 \$sample.R2.fastq -S \$sample.fungal.sam 2>\$sample.fungal.stat

#HIDDEN
HTML(filename='html/FungalAlign.html')

Sequencing and alignment Conclusions

Sequencing Observations

What patterns are there over the season?
  1. The percentage of plant reads sequenced is high thoughout the season and tapers off towards the end of the season. This may be do to senescence of plant cells.
  2. Alignment to the 6 selected fungal assemblies is low, but picks up during the warmer months. Overall, reads aligning to the combined fungal assembly were <10% of the non-plant reads.

Annotation Summary

The annotations were performed using KEGG's prokaryotic peptides and eukaryotic peptides

Based on the high-overlap, we decided to use the mags instead of the assembly for alignment

MetaG Analysis

MAG Stats

MAGs generated with reads that did not align to the fungal/host assemblies using metabat. Stats were then generated with checkm and the command below, results for that stats are below that. checkm lineage_wf -t 20 -x fa Final.contigs.fa.metabat-bins20 metaBinsStats
#HIDDEN
HTML(filename="html/MAG_Stats.html")

Table of the bins with ≥ 50% completeness

Bin Id Marker lineage Unnamed: 2 # Genomes # Markers # Marker sets 0 1 2 3 4 5+ Completeness Contamination Strain heterogeneity
0 bin.7 o__Burkholderiales (UID4105) 54 553 264 3 534 16 0 0 0 99.66 2.65 43.75
1 bin.39 f__Enterobacteriaceae (UID5054) 223 874 303 38 829 7 0 0 0 99.05 0.70 71.43
2 bin.67 o__Pseudomonadales (UID4488) 185 812 308 35 751 26 0 0 0 97.79 3.53 46.15
3 bin.157 o__Actinomycetales (UID1593) 69 400 198 21 364 14 1 0 0 96.84 4.73 29.41
4 bin.125 o__Actinomycetales (UID1663) 488 310 185 18 276 16 0 0 0 93.96 4.23 68.75
5 bin.164 o__Burkholderiales (UID4000) 193 427 214 44 379 4 0 0 0 92.27 1.64 50.00
6 bin.117 o__Rhizobiales (UID3654) 92 481 319 66 401 14 0 0 0 90.51 3.00 78.57
7 bin.202 o__Cytophagales (UID2936) 47 454 336 82 362 10 0 0 0 89.90 2.68 80.00
8 bin.181 o__Rhizobiales (UID3654) 92 481 319 73 381 27 0 0 0 89.84 5.71 33.33
9 bin.188 o__Burkholderiales (UID4000) 193 427 214 77 296 51 3 0 0 87.56 15.48 60.00
10 bin.198 o__Cytophagales (UID2936) 47 454 336 54 379 21 0 0 0 87.28 3.43 61.90
11 bin.121 o__Burkholderiales (UID4000) 193 427 214 68 339 20 0 0 0 87.25 5.02 30.00
12 bin.95 o__Burkholderiales (UID4002) 107 574 251 67 493 14 0 0 0 87.18 2.41 78.57
13 bin.214 o__Flavobacteriales (UID2815) 123 324 204 67 252 5 0 0 0 87.11 1.49 100.00
14 bin.195 k__Bacteria (UID203) 5449 104 58 25 54 16 8 1 0 82.99 35.50 78.26
15 bin.105 o__Actinomycetales (UID1593) 69 400 198 104 282 14 0 0 0 75.84 4.46 28.57
16 bin.88 k__Bacteria (UID203) 5449 104 58 28 39 22 10 5 0 73.82 38.55 34.15
17 bin.142 c__Deltaproteobacteria (UID3216) 83 247 155 49 194 4 0 0 0 73.12 2.58 25.00
18 bin.148 o__Sphingomonadales (UID3310) 26 569 293 154 398 17 0 0 0 72.70 2.26 17.65
19 bin.112 o__Flavobacteriales (UID2815) 123 324 204 100 214 10 0 0 0 71.14 2.71 80.00
20 bin.91 o__Sphingomonadales (UID3310) 26 569 293 191 261 99 18 0 0 69.76 25.37 22.22
21 bin.206 o__Rickettsiales (UID3809) 83 324 211 86 226 12 0 0 0 69.27 3.82 58.33
22 bin.93 o__Actinomycetales (UID1802) 274 385 212 105 254 25 1 0 0 68.43 8.09 53.57
23 bin.114 o__Actinomycetales (UID1663) 488 310 185 92 215 3 0 0 0 67.03 0.69 33.33
24 bin.104 k__Bacteria (UID203) 5449 103 57 52 38 12 1 0 0 67.01 12.41 46.67
25 bin.134 o__Rhizobiales (UID3654) 92 481 319 183 288 10 0 0 0 66.91 2.00 80.00
26 bin.163 o__Rickettsiales (UID3809) 83 324 211 101 216 7 0 0 0 66.80 2.53 14.29
27 bin.146 o__Actinomycetales (UID1697) 387 330 193 131 190 9 0 0 0 64.35 2.27 0.00
28 bin.205 o__Actinomycetales (UID1697) 387 330 193 125 198 7 0 0 0 63.48 3.11 42.86
29 bin.200 k__Bacteria (UID203) 5449 104 58 59 43 2 0 0 0 62.47 2.59 100.00
30 bin.57 o__Cytophagales (UID2936) 47 454 336 192 256 6 0 0 0 60.73 1.79 50.00
31 bin.10 k__Bacteria (UID203) 5449 104 58 63 41 0 0 0 0 60.69 0.00 0.00
32 bin.208 c__Alphaproteobacteria (UID3305) 564 337 221 134 193 10 0 0 0 60.38 2.96 90.00
33 bin.2 o__Actinomycetales (UID1815) 120 574 266 221 331 22 0 0 0 59.37 3.03 36.36
34 bin.175 o__Burkholderiales (UID4000) 193 427 214 171 246 8 2 0 0 58.90 2.35 50.00
35 bin.116 k__Bacteria (UID203) 5449 104 58 61 31 11 0 1 0 58.79 18.97 100.00
36 bin.66 o__Sphingomonadales (UID3310) 26 569 293 263 287 17 2 0 0 56.39 5.49 52.17
37 bin.139 o__Actinomycetales (UID1593) 69 400 198 165 203 31 0 1 0 56.31 7.46 64.86
38 bin.156 k__Bacteria (UID203) 5449 104 58 60 39 5 0 0 0 56.04 6.19 100.00
39 bin.55 k__Bacteria (UID203) 5449 103 57 65 28 9 1 0 0 56.01 13.16 66.67
40 bin.84 o__Burkholderiales (UID4000) 193 427 214 212 206 9 0 0 0 55.53 2.95 11.11
41 bin.171 k__Bacteria (UID203) 5449 104 58 35 31 24 14 0 0 55.22 22.74 3.03
42 bin.132 f__Rhizobiaceae (UID3564) 78 840 354 352 470 18 0 0 0 52.64 1.16 61.11
43 bin.54 k__Bacteria (UID203) 5449 103 57 69 34 0 0 0 0 52.63 0.00 0.00
44 bin.199 k__Bacteria (UID203) 5449 104 58 65 37 2 0 0 0 52.47 3.45 50.00
45 bin.115 k__Bacteria (UID203) 5449 104 58 43 38 12 11 0 0 52.30 19.17 66.67
46 bin.120 o__Sphingomonadales (UID3310) 26 569 293 274 271 23 1 0 0 51.10 5.80 38.46
47 bin.33 k__Bacteria (UID203) 5449 104 58 68 36 0 0 0 0 50.17 0.00 0.00
48 bin.65 k__Bacteria (UID203) 5449 104 58 64 39 1 0 0 0 49.32 1.72 100.00

MetaT Analysis

Overarching analysis - changes in metagenome content over time

All bins with at least 50% completeness were used in a combined assembly. The analysis below comes from the counts for each sample. These counts were normalized by the number of reads that did not align to the fungal or host assemblies.

COG accumulation over time

pcoa

Is time a factor that affects the functional richness of phylosphere communities (Alpha Diversity)?

Switchgrass Pearson's product-moment correlation
Data: Time and Richness
t = 10.223 df = 62 p-value = 6.316e-15
sample estimates: cor 0.7922484
Switchgrass Pearson's product-moment correlation
Data: Time and Shannon
t = 7.1534 df = 62 p-value = 1.166e-09
sample estimates: cor 0.6724267
Switchgrass Pearson's product-moment correlation
Data: Time and Pielou
t = 5.6951 df = 62 p-value = 3.622e-07
sample estimates: cor 0.586056
Miscanthus Pearson's product-moment correlation
Data: Time and Richness
t = 8.5659 df = 70 p-value = 1.658e-12
sample estimates: cor 0.7153784
Miscanthus Pearson's product-moment correlation
Data: Time and Shannon
t = 6.1267 df = 70 p-value = 4.675e-08
sample estimates: cor 0.5908107
Miscanthus Pearson's product-moment correlation
Data: Time and Pielou
t = 7.1602 df = 70 p-value = 6.369e-10
sample estimates: cor 0.6502079

Do we see changes over time in functional richness?

Call: adonis(formula = dist.otu ~ map_16S\$time_numeric)

Permutation: free Number of permutations: 999

Terms added sequentially (first to last)

Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
map_16S$time_numeric 1 1.9356 1.93562 47.814 0.26298 0.001 ***
Residuals 134 5.4246 0.04048 0.73702
Total 135 7.3603 1.000000

images/Level1_Mags.png

images/Level2_Mags.png

Is there a difference between treatment (Fertilized vs. Unfertilized)?

Call: adonis(formula = dist.otu ~ map_16S\$treatment)

Permutation: free Number of permutations: 999

Terms added sequentially (first to last)

Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
map_16S$treatment 1 0.0289 0.028930 0.52878 0.00393 0.755
Residuals 134 7.3313 0.054711 0.99607
Total 135 7.3603 1.000000

Is there a difference between host plant?

Call: adonis(formula = dist.otu ~ map_16S\$plant)

Permutation: free Number of permutations: 999

Terms added sequentially (first to last)

Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
map_16S$plant 1 0.1617 0.161704 3.0101 0.02197 0.027 *
Residuals 134 7.1986 0.053721 0.97803
Total 135 7.3603 1.000000

Is there similarity between the WGS communities and the 16S data (mantel/procrustes)?

Mantel statistic based on Pearson's product-moment correlation Call: mantel(xdis = genes.dist, ydis = otu.dist) Mantel statistic r: 0.6526 Significance: 0.001 Upper quantiles of permutations (null model): 90% 95% 97.5% 99% 0.0833 0.1194 0.1458 0.1849 Permutation: free Number of permutations: 999 Call: protest(X = genes.pcoa, Y = otu.pcoa) Procrustes Sum of Squares (m12 squared): 0.5402 Correlation in a symmetric Procrustes rotation: 0.6781 Significance: 0.001 Permutation: free Number of permutations: 999

How variable are metagenomes across replicate time points?

 

MetaT Sequencing and Alignment Summaries

Full mapping script

What is the abundance and quality of the reads in each sample?

#HIDDEN
HTML("html/MetaT_TrimStats.html")
MultiQC Report

Loading report..

Highlight Samples

Regex mode off

    Rename Samples

    Click here for bulk input.

    Paste two columns of a tab-delimited table here (eg. from Excel).

    First column should be the old name, second column the new name.

    Regex mode off

      Show / Hide Samples

      Regex mode off

        Export Plots

        px
        px
        X

        Download the raw data used to create the plots in this report below:

        Note that additional data was saved in multiqc_data when this report was generated.


        Choose Plots

        If you use plots from MultiQC in a publication or presentation, please cite:

        MultiQC: Summarize analysis results for multiple tools and samples in a single report
        Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
        Bioinformatics (2016)
        doi: 10.1093/bioinformatics/btw354
        PMID: 27312411

        Save Settings

        You can save the toolbox settings for this report to the browser.


        Load Settings

        Choose a saved report profile from the dropdown box below:

        About MultiQC

        This report was generated using MultiQC, version 1.7

        You can see a YouTube video describing how to use MultiQC reports here: https://youtu.be/qPbIlO_KWN0

        For more information about MultiQC, including other videos and extensive documentation, please visit http://multiqc.info

        You can report bugs, suggest improvements and find the source code for MultiQC on GitHub: https://github.com/ewels/MultiQC

        MultiQC is published in Bioinformatics:

        MultiQC: Summarize analysis results for multiple tools and samples in a single report
        Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
        Bioinformatics (2016)
        doi: 10.1093/bioinformatics/btw354
        PMID: 27312411

        Report generated on 2020-02-25, 12:44 based on data in: /mnt/research/ShadeLab/GLBRC/mapping/metaT/fullAssembly/trimStats


        loading..

        How do the reads align to the Switchgrass assembly?

        #HIDDEN
        HTML("html/MetaT_HostAlignment.html")
        
        MultiQC Report

        Loading report..

        Highlight Samples

        Regex mode off

          Rename Samples

          Click here for bulk input.

          Paste two columns of a tab-delimited table here (eg. from Excel).

          First column should be the old name, second column the new name.

          Regex mode off

            Show / Hide Samples

            Regex mode off

              Export Plots

              px
              px
              X

              Download the raw data used to create the plots in this report below:

              Note that additional data was saved in multiqc_data when this report was generated.


              Choose Plots

              If you use plots from MultiQC in a publication or presentation, please cite:

              MultiQC: Summarize analysis results for multiple tools and samples in a single report
              Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
              Bioinformatics (2016)
              doi: 10.1093/bioinformatics/btw354
              PMID: 27312411

              Save Settings

              You can save the toolbox settings for this report to the browser.


              Load Settings

              Choose a saved report profile from the dropdown box below:

              About MultiQC

              This report was generated using MultiQC, version 1.7

              You can see a YouTube video describing how to use MultiQC reports here: https://youtu.be/qPbIlO_KWN0

              For more information about MultiQC, including other videos and extensive documentation, please visit http://multiqc.info

              You can report bugs, suggest improvements and find the source code for MultiQC on GitHub: https://github.com/ewels/MultiQC

              MultiQC is published in Bioinformatics:

              MultiQC: Summarize analysis results for multiple tools and samples in a single report
              Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
              Bioinformatics (2016)
              doi: 10.1093/bioinformatics/btw354
              PMID: 27312411

              Report regenerated on 2020-02-18, 21:40 based on data in: /mnt/research/ShadeLab/GLBRC/mapping/metaT/hostRemovalFlagstats


              There are 6 possible types of alignment: * PE mapped uniquely: Pair has only one occurence in the reference genome. * PE mapped discordantly uniquely: Pair has only one occurence but not in proper pair. * PE one mate mapped uniquely: One read of a pair has one occurence. * PE multimapped: Pair has multiple occurence. * PE one mate multimapped: One read of a pair has multiple occurence. * PE neither mate aligned: Pair has no occurence.

              loading..

              How do the reads align to the MAG assembly?

               
              

              Outline

              MetaT Analysis

              MetaT Kallisto

              Link to the interactive tables. Static tables shown below.

              MetaT Trim Report

              Link to the interactive metaT trim report.

              Alignment of the MetaT reads to the MAG assembly

              MetaT alignment by count

              MetaT alignment by percentage

              Next Steps

              1. Fix the metadata to ensure that each sample is accounted for and we have the seq files (Nejc)      ✓ Completed Wednesday, Feb 19,2020. Delivered by Kiera
              2. Modify the R analysis script to read results from the MetaT data (Nejc & Shane)

                In order for this to progress, 1 had to be complete. The next step is figure out why the metadata for the MetaT is not meshing with the R script.

              3. Explore the Kallisto output to determine how to extract annotations (Shane)

                Kallisto has an arguement to export a bam file when it runs, but I do not know how these bam files work or if they are the same thing as a bam from bowtie2. I am working with kallisto to get a bam file now and then I will see if that file works to extract annotations from the contigs